A new approach to transient processing in the phase vocoder
In this paper we propose a new method to reduce phase vocoder artifacts during attack transients. In contrast to all previously proposed transient preservation algorithms, the new approach does not impose any constraints on the time dilation parameter for processing transient segments. By investigating the spectral properties of attack transients of simple sinusoids, we provide new insights into the causes of phase vocoder artifacts and propose a new method for transient preservation as well as a new criterion and a new algorithm for transient detection. Both the transient detection and the transient processing algorithms are designed to operate on the level of spectral bins, which reduces possible artifacts in stationary signal components that are close to the spectral peaks classified as transient. The transient detection criterion is closely related to the transient position and allows us to find an optimal position for reinitializing the phase spectrum. An evaluation of the transient detector on a hand-labeled database demonstrates its superior performance compared to a previously published algorithm. Attack transients in sound signals transformed with the new algorithm achieve high quality even when strong dilation is applied to polyphonic signals.
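The abstract describes reinitializing the phase spectrum on the level of individual spectral bins. The following Python snippet is a minimal sketch under stated assumptions, not the paper's algorithm: it shows standard phase-vocoder phase propagation for a time stretch, where bins flagged by a hypothetical boolean `transient_mask` take the analysis phase directly instead of the propagated one. The paper's detection criterion and its choice of the optimal reinitialization position are not reproduced, and the function name `pv_propagate_phases` and its parameters are illustrative.

```python
import numpy as np


def pv_propagate_phases(X, hop_a, hop_s, n_fft, transient_mask):
    """Phase-vocoder phase propagation with bin-wise phase reinitialization.

    X              complex STFT, shape (n_frames, n_bins), analysis hop hop_a
    hop_s          synthesis hop (time dilation factor = hop_s / hop_a)
    transient_mask bool array, same shape as X; True marks bins that a
                   hypothetical detector classified as transient
    Returns the modified STFT, to be overlap-added with hop hop_s.
    """
    n_frames, n_bins = X.shape
    omega = 2 * np.pi * np.arange(n_bins) / n_fft  # bin centre frequencies (rad/sample)
    mag, phase_a = np.abs(X), np.angle(X)
    phase_s = np.empty_like(phase_a)
    phase_s[0] = phase_a[0]
    for m in range(1, n_frames):
        # deviation from the bin centre frequency, wrapped to (-pi, pi]
        dev = phase_a[m] - phase_a[m - 1] - omega * hop_a
        dev = np.angle(np.exp(1j * dev))
        inst_freq = omega + dev / hop_a  # instantaneous frequency estimate
        propagated = phase_s[m - 1] + inst_freq * hop_s
        # transient bins are reset to the analysis phase; all other bins
        # keep the horizontally propagated phase
        phase_s[m] = np.where(transient_mask[m], phase_a[m], propagated)
    return mag * np.exp(1j * phase_s)
```

With `hop_s == hop_a` and an all-False mask, the function returns the input STFT up to rounding, which is a quick sanity check of the propagation step. Restricting the reset to flagged bins leaves stationary components in neighbouring bins untouched.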
A New Score Function for Joint Evaluation of Multiple F0 Hypotheses
This article is concerned with estimating the fundamental frequencies of the quasi-harmonic sources in polyphonic signals for the case where the number of sources is known. We propose a new method for jointly evaluating multiple F0 hypotheses based on three physical principles: harmonicity, spectral smoothness, and synchronous amplitude evolution within a single source. Given the observed spectrum, a set of F0 candidates is listed, and for each hypothetical combination of candidates the corresponding hypothetical partial sequences are derived. These partial sequences are then evaluated using a score function that formulates the guiding principles mathematically. The algorithm has been tested on a large collection of artificially mixed polyphonic samples, and the encouraging results demonstrate the competitive performance of the proposed method.
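To make the idea of jointly scoring F0 combinations concrete, here is a deliberately simplified Python sketch. It is not the paper's score function: the functions `partial_amplitudes`, `toy_score` and `best_combination`, the weight `w_smooth`, and the choice of eight harmonics are all hypothetical. It rewards harmonicity (the share of spectral energy lying on hypothesized partials) and penalizes non-smooth partial amplitude sequences, then evaluates every combination of candidates exhaustively. The third principle, synchronous amplitude evolution, needs several frames and is omitted here.

```python
import itertools

import numpy as np


def partial_amplitudes(mag, freqs, f0, n_harm):
    """Magnitudes at the bins nearest to the harmonics h * f0, h = 1..n_harm."""
    targets = f0 * np.arange(1, n_harm + 1)
    idx = np.abs(freqs[None, :] - targets[:, None]).argmin(axis=1)
    return mag[idx], idx


def toy_score(mag, freqs, f0s, n_harm=8, w_smooth=0.5):
    """Hypothetical joint score for one combination of F0s (higher is better)."""
    on_partial = np.zeros(len(mag), dtype=bool)
    roughness = 0.0
    for f0 in f0s:
        amps, idx = partial_amplitudes(mag, freqs, f0, n_harm)
        on_partial[idx] = True
        # spectral smoothness: penalize jumps between neighbouring partial log-amplitudes
        roughness += np.mean(np.diff(np.log(amps + 1e-12)) ** 2)
    # harmonicity: share of the spectral energy located on hypothesized partials
    harmonicity = np.sum(mag[on_partial] ** 2) / np.sum(mag ** 2)
    return harmonicity - w_smooth * roughness / len(f0s)


def best_combination(mag, freqs, candidates, n_sources, **kwargs):
    """Exhaustively evaluate every combination of n_sources candidates."""
    combos = itertools.combinations(candidates, n_sources)
    return max(combos, key=lambda c: toy_score(mag, freqs, c, **kwargs))
```

In this toy version the smoothness term is what rejects sub-octave candidates: every other hypothesized partial of F0/2 falls on a nearly empty bin, producing large jumps in the partial log-amplitudes.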
Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation
In this article the estimation of the spectral envelope of sound signals is addressed. The intended application of the developed algorithm is pitch shifting with preservation of the spectral envelope in the phase vocoder. As a first step, the existing envelope estimation algorithms are investigated and their specific properties discussed. The cepstrum-based iterative true envelope estimator is selected as the most promising algorithm. By means of controlled sub-sampling of the log amplitude spectrum and a simple step size control for the iterative algorithm, the run time of the algorithm can be decreased by a factor of 2.5 to 11. As a remedy for the ringing effects in the spectral envelope that are caused by the rectangular filter used for spectral smoothing, we propose using a Hamming window as the smoothing filter. The resulting implementation has slightly higher computational complexity than the standard LPC algorithm but offers significantly improved control over the envelope characteristics. The use of the true envelope estimator for pitch shifting is then investigated: the main problems for pitch shifting with envelope preservation in a phase vocoder are identified, and a simple yet efficient remedy is proposed.
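The basic true envelope iteration, as it is commonly described, can be sketched as follows. Let A_0 be the log-magnitude spectrum; at each step, A_i = max(A_{i-1}, V_{i-1}), and V_i is the cepstrally smoothed A_i. The iteration stops once V_i lies no more than a tolerance below A_0 at every bin. This minimal Python sketch uses a Hamming-shaped lifter in place of a rectangular one, mirroring the remedy named above. It omits the sub-sampling and step size control that provide the reported speed-up. The function names, the 2 dB default tolerance and the iteration cap are assumptions for illustration, and how the paper adjusts the cepstral order for the Hamming lifter is not reproduced.

```python
import numpy as np


def cepstral_smooth(log_mag, order, hamming=True):
    """Low-quefrency liftering of a one-sided log spectrum (n_fft // 2 + 1 bins)."""
    n_fft = 2 * (len(log_mag) - 1)
    ceps = np.fft.irfft(log_mag, n=n_fft)  # real cepstrum (even sequence)
    # right half of a Hamming window of length 2 * order + 1 (value 1 at quefrency 0),
    # or a rectangular lifter for comparison
    half = np.hamming(2 * order + 1)[order:] if hamming else np.ones(order + 1)
    lifter = np.zeros(n_fft)
    lifter[: order + 1] = half
    lifter[n_fft - order:] = half[1:][::-1]  # mirrored for negative quefrencies
    return np.fft.rfft(ceps * lifter).real


def true_envelope(log_mag, order, tol=np.log(10 ** (2 / 20)), max_iter=200):
    """Iterative true envelope; natural-log amplitudes, tol defaults to 2 dB (assumed)."""
    target = np.asarray(log_mag, dtype=float)
    a = target.copy()
    env = cepstral_smooth(a, order)
    for _ in range(max_iter):
        if np.max(target - env) <= tol:  # envelope reaches all spectral peaks within tol
            break
        a = np.maximum(a, env)  # fill spectral valleys, keep peaks
        env = cepstral_smooth(a, order)
    return env
```

For a flat log spectrum the envelope equals the spectrum after the first smoothing. For a harmonic comb, the valleys are filled step by step until the envelope passes through the peaks.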
Adaptive Noise Level Estimation
We describe a novel algorithm for estimating the colored noise level in audio signals containing both noise and sinusoidal components. The noise envelope model is based on the assumptions that the envelope varies slowly with frequency and that the magnitudes of the noise peaks obey a Rayleigh distribution. Our method extends a recently proposed approach to classifying spectral peaks as sinusoids or noise, which uses a noise envelope model to improve the detection of sinusoidal peaks. By iteratively evaluating and adapting the noise envelope model, the classification of noise and sinusoidal peaks is refined until the detected noise peaks are coherently explained by the noise envelope model. Examples of estimating white and colored noise are presented.
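The iterative refinement can be illustrated with a small Python sketch built on the two stated assumptions. The first is the slowly varying envelope, modelled here as a low-order polynomial in frequency. The second is Rayleigh-distributed noise-peak magnitudes, for which E|X| = σ·sqrt(π/2) and P(|X| > t) = exp(-t²/(2σ²)). The function name `estimate_noise_envelope`, the polynomial degree and the false-alarm probability `p_fa` are hypothetical choices, and this is not the authors' descriptor-based classifier.

```python
import numpy as np


def estimate_noise_envelope(peak_freqs, peak_mags, p_fa=1e-3, deg=3, max_iter=20):
    """Iteratively fit a slowly varying noise envelope to spectral peak magnitudes.

    Returns (mean_env, is_noise): the estimated mean noise-peak magnitude at each
    peak frequency and a boolean noise/sinusoid classification of the peaks.
    """
    x = peak_freqs / np.max(peak_freqs)  # normalized frequency for a well-conditioned fit
    is_noise = np.ones(len(peak_mags), dtype=bool)  # start: every peak is noise
    for _ in range(max_iter):
        # slowly varying envelope: low-order polynomial through the noise peaks only
        coef = np.polyfit(x[is_noise], peak_mags[is_noise], deg)
        mean_env = np.maximum(np.polyval(coef, x), 1e-12)
        sigma = mean_env / np.sqrt(np.pi / 2)  # Rayleigh: E|X| = sigma * sqrt(pi / 2)
        # Rayleigh tail: P(|X| > threshold) = exp(-threshold**2 / (2 sigma**2)) = p_fa
        threshold = sigma * np.sqrt(-2.0 * np.log(p_fa))
        new_is_noise = peak_mags <= threshold
        if np.array_equal(new_is_noise, is_noise):
            break  # classification is consistent with the envelope model
        is_noise = new_is_noise
    return mean_env, is_noise
```

In the first pass the sinusoidal peaks inflate the fitted envelope. Once they exceed the threshold they are excluded, and the next fit follows the noise floor more closely.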
Adaptive Threshold Determination for Spectral Peak Classification
A new approach to adaptive threshold selection for classifying peaks in audio spectra is presented. We extend previous work on classifying sinusoidal and noise peaks with a set of spectral peak descriptors in two ways. First, we propose a compact sinusoidal model in which all modulation parameters are defined with respect to the analysis window. This is important because STFT spectra are closely related to the properties of the analysis window. Second, we design a threshold selection algorithm that allows the decision thresholds to be controlled intuitively. The decision thresholds, calculated from the relationships established between the noise power in the signal and the distributions of sinusoidal peaks, ensure that all peaks described as sinusoidal are correctly classified. We also show that the threshold selection algorithm can be used for different types of analysis windows with only slight parameter readjustment.
Frequency Slope Estimation and its Application for Non-Stationary Sinusoidal Parameter Estimation
In this paper we investigate the estimation of sinusoidal parameters for sinusoids with linear AM/FM modulation. We show that, for linear amplitude and frequency modulation, only the frequency modulation introduces additional estimation bias for the standard sinusoidal parameter estimator. We then propose an enhanced algorithm for frequency-domain demodulation of spectral peaks that yields an approximate maximum likelihood estimate of the frequency slope, as well as estimates of the amplitude, phase and frequency parameters with significantly reduced bias. An experimental evaluation compares the new estimation scheme with existing methods and shows that significant bias reduction is achieved for a large range of slopes and zero-padding factors. A real-world example demonstrates that the enhanced bias reduction algorithm can reduce the residual energy by up to 9 dB.
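As a point of reference for what "frequency slope" means for a linear FM sinusoid, the following Python sketch estimates it with a different, much simpler technique than the paper's frequency-domain demodulation. It fits a quadratic to the unwrapped phase of a clean complex-valued chirp, phi(t) = phi0 + 2·π·f0·t + π·slope·t², so that the instantaneous frequency is f0 + slope·t. The function name `chirp_params_from_phase` is hypothetical. The approach assumes a single noise-free component with instantaneous frequency below fs/2, so it is only useful for checking estimators on synthetic test signals.

```python
import numpy as np


def chirp_params_from_phase(z, fs):
    """Estimate start frequency (Hz), frequency slope (Hz/s) and start phase (rad)
    of a complex linear chirp by least-squares fitting a quadratic to its phase:
        phi(t) = phi0 + 2*pi*f0*t + pi*slope*t**2
    """
    t = np.arange(len(z)) / fs
    phi = np.unwrap(np.angle(z))  # requires |instantaneous frequency| < fs / 2
    c2, c1, c0 = np.polyfit(t, phi, 2)
    return c1 / (2 * np.pi), c2 / np.pi, c0
```

The amplitude envelope does not enter the phase fit, so a linear amplitude modulation leaves these estimates unchanged for a noise-free signal.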
A Source-Filter Model for Quasi-Harmonic Instruments
In this paper we propose a generalized model that represents the time-varying spectral characteristics of quasi-harmonic instruments. The approach comprises a linear source-filter model, a parameter estimation method, and a model evaluation based on the prototype's variance. The source-filter model is composed of an excitation source that generates sinusoidal parameter trajectories and a resonance filter; B-splines (basis splines) are used to model continuous trajectories. To estimate the model parameters, we apply a gradient descent method to a training database, and the prototype's variance is estimated on a test database. Such a model could later serve as a priori knowledge for polyphonic instrument recognition, polyphonic transcription and source separation algorithms, as well as for resynthesis.
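A minimal Python sketch of one ingredient follows: representing a resonance filter as a B-spline curve over frequency and fitting its coefficients by gradient descent. It assumes that the source contribution has already been removed from the partial log-amplitudes. It also assumes uniform knots, cubic splines and a mean-squared-error loss. The function names `bspline_basis` and `fit_filter_gd`, the number of knot intervals and the learning rate are hypothetical. This is not the paper's full source-filter estimation, which also models the source trajectories.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """B-spline basis matrix via the Cox-de Boor recursion.

    Returns shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)
    # degree 0: indicator functions of the half-open knot intervals [t_i, t_{i+1})
    B = np.stack([(knots[i] <= x) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], axis=1).astype(float)
    for d in range(1, degree + 1):
        n = len(knots) - d - 1
        B_next = np.zeros((len(x), n))
        for i in range(n):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_next[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B


def fit_filter_gd(freqs, log_amps, n_intervals=6, degree=3, lr=1.0, n_iter=5000):
    """Fit F(f) = sum_i c_i B_i(f) to partial log-amplitudes by gradient descent."""
    lo = freqs.min()
    hi = freqs.max() * (1 + 1e-6) + 1e-9  # keep the highest frequency inside [lo, hi)
    h = (hi - lo) / n_intervals
    # uniform knots extended beyond [lo, hi) so the basis sums to one on [lo, hi)
    knots = lo + h * np.arange(-degree, n_intervals + degree + 1)
    B = bspline_basis(freqs, knots, degree)
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ c - log_amps) / len(freqs)  # gradient of 0.5 * mean squared error
        c -= lr * grad
    return c, knots
```

Because each row of the basis matrix sums to one, the step size 1.0 keeps this plain gradient descent stable. A closed-form least-squares solve would give the same fit for this linear problem; gradient descent is used here only to mirror the estimation method named in the abstract.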
A Segmental Spectro-Temporal Model of Musical Timbre
We propose a new statistical model of musical timbre that handles the different segments of the temporal envelope (attack, sustain and release) separately in order to account for their different spectral and temporal behaviors. The model is based on a reduced-dimensionality representation of the spectro-temporal envelope. Temporal coefficients corresponding to the attack and release segments are subjected to explicit trajectory modeling based on a non-stationary Gaussian Process. Coefficients corresponding to the sustain phase are modeled as a multivariate Gaussian. A compound similarity measure associated with the segmental model is proposed and successfully tested in instrument classification experiments. Apart from its use in a statistical framework, the modeling method allows intuitive and informative visualizations of the characteristics of musical timbre.
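The segmental idea (separate models for attack, sustain and release, combined into one similarity) can be sketched in Python as follows, with heavy simplifications. The sustain frames are modelled as a multivariate Gaussian, as in the abstract. The paper's non-stationary Gaussian Process for attack and release trajectories is replaced by a swapped-in, simpler per-frame Gaussian with time-varying mean and variance. The compound similarity is assumed to be the sum of the segment log-likelihoods. The names `train`, `compound_score`, `traj_loglik` and `gauss_loglik`, the regularization constant, and the requirement that attack and release trajectories be resampled to a common length are all hypothetical.

```python
import numpy as np


def gauss_loglik(x, mean, cov):
    """Summed log-density of the rows of x under a multivariate Gaussian."""
    x = np.atleast_2d(x)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return float(np.sum(-0.5 * (maha + logdet + x.shape[1] * np.log(2 * np.pi))))


def traj_loglik(traj, mean_traj, var_traj):
    """Independent Gaussian per frame and coefficient, with time-varying mean/variance."""
    return float(np.sum(-0.5 * ((traj - mean_traj) ** 2 / var_traj
                                + np.log(2 * np.pi * var_traj))))


def train(examples, reg=1e-3):
    """examples: list of (attack, sustain, release) coefficient arrays.

    attack/release: shape (T, D), resampled to a common T per segment (assumed);
    sustain: shape (N_i, D), any number of frames.
    """
    att = np.stack([e[0] for e in examples])
    rel = np.stack([e[2] for e in examples])
    sus = np.concatenate([e[1] for e in examples])
    d = sus.shape[1]
    return {
        "att": (att.mean(axis=0), att.var(axis=0) + reg),
        "rel": (rel.mean(axis=0), rel.var(axis=0) + reg),
        "sus": (sus.mean(axis=0), np.cov(sus, rowvar=False) + reg * np.eye(d)),
    }


def compound_score(model, attack, sustain, release):
    """Hypothetical compound similarity: sum of the three segment log-likelihoods."""
    return (traj_loglik(attack, *model["att"])
            + gauss_loglik(sustain, *model["sus"])
            + traj_loglik(release, *model["rel"]))
```

For classification, one model is trained per instrument, and a test note is assigned to the model with the highest compound score.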
Between Physics and Perception: Signal Models for High Level Audio Processing
Signal models are one of the key factors that enable high-quality signal transformation algorithms with intuitive high-level control parameters. In this article we discuss signal models, and the signal transformation algorithms based on them, in relation to the physical properties of the sound source and the properties of human sound perception. We argue that implementing perceptually intuitive, high-quality signal transformation algorithms requires strong links between the signal models and the perceptually relevant physical properties of the sound source. We present an overview of the history of two sound models used for sound transformation and show how the past and future evolution of sound transformation algorithms is driven by our understanding of the physical world.
A Shape-Invariant Phase Vocoder for Speech Transformation
This paper proposes a new method for shape-invariant real-time modification of speech signals. The method can be understood as a frequency-domain SOLA algorithm that uses the phase vocoder algorithm for phase synchronization. Compared to time-domain SOLA, the new implementation provides improved time synchronization during overlap-add and improved quality of the noise components of the transformed speech signals. In two perceptual tests, the algorithm was compared with recent implementations of PSOLA and HNM algorithms and performed very satisfactorily. Because the quality of the transformed signals stays constant over a wide range of transformation parameters, the algorithm is well suited for real-time gender and age transformations.